ESTIMATING THE PROPORTION OF FALSE NULL 
HYPOTHESES AMONG A LARGE NUMBER OF 
INDEPENDENTLY TESTED HYPOTHESES 

By Nicolai Meinshausen and John Rice 

ETH Zurich and University of California, Berkeley 

We consider the problem of estimating the number of false null 
hypotheses among a very large number of independently tested hy- 
potheses, focusing on the situation in which the proportion of false 
null hypotheses is very small. We propose a family of methods for 
establishing lower $100(1-\alpha)\%$ confidence bounds for this proportion, 
based on the empirical distribution of the p-values of the tests. 
Methods in this family are then compared in terms of ability to 
consistently estimate the proportion by letting $\alpha \to 0$ as the number of 
hypothesis tests increases and the proportion decreases. This work is 
motivated by a signal detection problem that occurs in astronomy. 

1. Introduction. An example that motivated our work is afforded by 
the Taiwanese-American Occultation Survey (TAOS), which we now briefly 
describe. The TAOS will attempt to detect small objects in the Kuiper 
Belt, a region of the solar system beyond the orbit of Neptune. The Kuiper 
Belt contains an unknown number of objects (KBOs), most of which are 
believed to be so small that they do not reflect enough light back to Earth 
to be directly observed. The purpose of the TAOS project is to estimate the 
number of these KBOs down to the typical size of cometary nuclei (a few 
kilometers) by observing occultations. The idea of the occultation technique 
is simple to describe. One monitors the light from a collection of stars that 
have angular sizes smaller than the expected angular sizes of comets. An 
occultation is manifested by detecting the partial or total reduction in the 
flux from one of the stars for a brief interval when an object in the Kuiper 
Belt passes between it and the observer. Four dedicated robotic telescopes 
will automatically monitor 2000-3000 stars every clear night for several years 
and their combined results will be used to test for an occultation of each 
star approximately every 0.20 seconds, yielding on the order of 10 tests 
per year. The number of occultations expected per year ranges from tens 
to a few thousands, depending on what model of the Kuiper Belt is used. 
Having conducted a large number of tests, it is then of interest to estimate 
the number of occultations, or the occultation rate, since this will provide 
information on the distribution of KBOs. Note that in this context we are 
not so much interested in which particular null hypotheses are false as in 
how many are. The TAOS project was further described by Liang et al. [8] 
and Chen et al. [3]. 

We will base our analysis on the distribution of the p-values of the 
hypothesis tests. Let $\{G_\theta, \theta \in \Theta\}$ be some family of 
distributions, where $\theta$ is possibly infinite-dimensional and 
$G_0(t) = t$, that is, $G_0$ is the uniform distribution on $[0,1]$. All 
p-values are assumed to be independently distributed according to 

$P_i \sim G_{\theta_i}, \qquad i = 1, \ldots, n.$

If a null hypothesis is true, the distribution of its p-value is uniform on 
$[0,1]$ and $P_i \sim G_0$. We suppose that neither the family 
$\{G_\theta(t), \theta \in \Theta\}$ nor the parameter vector 
$(\theta_1, \ldots, \theta_n)$ is known, except for the fact that $G_0$ 
corresponds to the uniform distribution. 

The proportion of null hypotheses that are false (the fraction of 
occultations in the TAOS example) is denoted by 

(1) $\lambda = n^{-1} \sum_{i=1}^{n} 1\{\theta_i \neq 0\}.$

Our goal is to construct a lower bound $\hat\lambda$ with the property 

(2) $P(\hat\lambda \le \lambda) \ge 1 - \alpha$

for a specified confidence level $1 - \alpha$. Such a lower bound would allow 
one to assert, with a specified level of confidence, that the proportion of 
false null hypotheses is at least $\hat\lambda$. The global null hypothesis 
that there are no false null hypotheses can be tested at level $\alpha$ by 
rejecting when $\hat\lambda > 0$. 

Our construction is closely related to that by Meinshausen and Bühlmann 
[9], which treats the case of possibly dependent tests, but with an observa- 
tional structure that allows the use of permutation arguments that are not 
available in our case. Another estimate was examined by Nettleton and 
Hwang [10], but it does not have a property like (2). Our methodology is 
related to that of controlling the false discovery rate [1, 13], but the goals 
are different — we are not so much interested in which particular hypotheses 
are false as in how many are. However, we note that an estimate of the 
number of the false null hypotheses can be usefully employed in adaptive 
control of the false discovery rate [2]. In a modification of the original FDR 
method, Storey [13] also estimated the proportion of false hypotheses. The 
empirical distribution of p-values was used by Schweder and Spjøtvoll [11] 
to estimate the number of true null hypotheses; the methods used there are 
different from ours and do not provide explicit lower confidence bounds. The 
methods in this paper extend a proposal of Genovese and Wasserman [7]. 
We also relate our results to those of Donoho and Jin [6]. 

2. Theory and methodology. The estimate hinges on the definition of 
bounding functions and bounding sequences. 

Let $U$ be uniform on $[0,1]$. Let $U_n(t)$ be the empirical cumulative 
distribution function of $n$ independent realizations of a random variable 
with distribution $U$. For any real-valued function $\delta(t)$ on $[0,1]$ 
which is strictly positive on $(0,1)$, define $V_{n,\delta}$ as the supremum 
of the weighted empirical distribution 

(3) $V_{n,\delta} = \sup_{t \in (0,1)} \frac{U_n(t) - t}{\delta(t)}.$

Definition 1. A bounding function $\delta(t)$ is any real-valued function on 
$[0,1]$ that is strictly positive on $(0,1)$. A sequence $\beta_{n,\alpha}$ is 
called a bounding sequence for a bounding function $\delta(t)$ if, for a 
constant level $\alpha$: 

(a) $n\beta_{n,\alpha}$ is monotonically increasing with $n$; 

(b) $P(V_{n,\delta} > \beta_{n,\alpha}) \le \alpha$ for all $n$. 

The definition of a bounding sequence depends neither on the unknown 
proportion of false null hypotheses nor on the unknown distribution $G(t)$ of 
p-values under the alternative. 
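
Condition (b) of Definition 1 can be explored numerically for a single fixed n. The Python sketch below simulates the weighted supremum under the global null and takes an empirical upper quantile as a candidate value of the bounding sequence at that n. The helper names, the use of NumPy and the number of replications are illustrative assumptions rather than part of the method; a Monte Carlo quantile only approximates (b) and says nothing about the monotonicity condition (a).

```python
import numpy as np

def v_statistic(u, delta):
    """sup_t (U_n(t) - t) / delta(t), scanned over the order statistics.

    Assumption: delta(t) is 1, t or sqrt(t(1-t)). For these choices the
    ratio is decreasing between jumps of U_n, so the supremum is attained
    at one of the observed values.
    """
    u = np.sort(u)
    n = u.size
    ecdf = np.arange(1, n + 1) / n            # U_n at the k-th order statistic
    return np.max((ecdf - u) / delta(u))

def monte_carlo_beta(n, alpha, delta, reps=2000, seed=0):
    """Empirical (1 - alpha)-quantile of the statistic under uniform p-values."""
    rng = np.random.default_rng(seed)
    sims = np.array([v_statistic(rng.uniform(size=n), delta) for _ in range(reps)])
    return float(np.quantile(sims, 1.0 - alpha))

def constant(t):
    return np.ones_like(t)

beta_mc = monte_carlo_beta(n=500, alpha=0.05, delta=constant)
# Reference value from the two-sided DKW inequality (valid but conservative)
beta_dkw = float(np.sqrt(np.log(2 / 0.05) / (2 * 500)))
print(beta_mc, beta_dkw)
```

For the constant function the simulated quantile can be compared with the value implied by the two-sided Dvoretzky-Kiefer-Wolfowitz inequality, which should be somewhat larger because that inequality is not tight.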

One is interested in the case where a proportion $\lambda$ of all hypotheses 
are false null hypotheses. Denote the empirical distribution of p-values by 

(4) $F_n(t) := n^{-1} \sum_{i=1}^{n} 1\{P_i \le t\}.$

Estimating the proportion of false null hypotheses can be achieved by 
bounding the maximal contribution of true null hypotheses to the empirical 
distribution function of p-values. We give a brief motivation. Suppose for a 
moment that there are only true null hypotheses. The expected fraction 
of p-values less than or equal to some $t \in (0,1)$ equals, in this scenario, 
$U(t) = t$. The realized fraction $U_n(t)$ is, on the other hand, frequently 
larger than $t$. However, using Definition 1, the probability that $U_n(t)$ is 
larger than $t + \beta_{n,\alpha}\delta(t)$ is bounded by $\alpha$ 
simultaneously for all values of $t \in (0,1)$. The proportion of p-values in 
the given sample that are in excess of the bound 
$t + \beta_{n,\alpha}\delta(t)$ can thus be attributed to the existence of a 
corresponding proportion of false null hypotheses, and 
$F_n(t) - t - \beta_{n,\alpha}\delta(t)$ is hence a low-biased estimate of 
$\lambda$. As the bound for the contribution of true null hypotheses holds 
simultaneously for all values of $t \in (0,1)$, a lower bound for $\lambda$ is 
obtained by taking the supremum of $F_n(t) - t - \beta_{n,\alpha}\delta(t)$ 
over the interval $(0,1)$. A refined analysis shows that an additional factor 
$1/(1-t)$ can be gained when estimating the proportion of false null 
hypotheses. 

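Definition 2. Let $\beta_{n,\alpha}$ be a bounding sequence for $\delta(t)$ at 
level $\alpha$. An estimate for the proportion $\lambda$ of false null 
hypotheses is given by 

(5) $\hat\lambda = \sup_{t \in (0,1)} \frac{F_n(t) - t - \beta_{n,\alpha}\delta(t)}{1 - t}.$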
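
A minimal Python sketch of how the supremum in (5) might be evaluated in practice is given below. It assumes a bounding function of the form 1, t or sqrt(t(1-t)), for which the ratio is decreasing between jumps of the empirical distribution, so that scanning the observed p-values suffices. The function name, the clipping of negative values at zero, the simulated data and the use of the two-sided Dvoretzky-Kiefer-Wolfowitz bound as bounding sequence are all illustrative assumptions.

```python
import numpy as np

def lower_bound_proportion(pvalues, beta, delta):
    """Evaluate sup_t (F_n(t) - t - beta * delta(t)) / (1 - t) over observed p-values.

    p-values equal to 0 or 1 are skipped as evaluation points but still count
    towards F_n. A negative supremum is reported as 0 (an assumption made here,
    since a negative lower bound for a proportion is uninformative).
    """
    p = np.sort(np.asarray(pvalues, dtype=float))
    n = p.size
    ecdf = np.arange(1, n + 1) / n             # F_n at the k-th smallest p-value
    inside = (p > 0.0) & (p < 1.0)
    t = p[inside]
    if t.size == 0:
        return 0.0
    values = (ecdf[inside] - t - beta * delta(t)) / (1.0 - t)
    return max(0.0, float(values.max()))

# Illustrative data: 100 very small p-values among 1000 tests
rng = np.random.default_rng(1)
n, n_false, alpha = 1000, 100, 0.05
pvalues = np.concatenate([rng.uniform(0.0, 1e-4, n_false),   # false nulls (made-up signal)
                          rng.uniform(size=n - n_false)])    # true nulls
beta_dkw = np.sqrt(np.log(2 / alpha) / (2 * n))              # assumed bounding sequence
lam_hat = lower_bound_proportion(pvalues, beta_dkw, lambda t: np.ones_like(t))
print(lam_hat, n_false / n)
```

With a constant bounding function the estimate stays visibly below the true proportion 0.1 even for this very strong signal, because the full value of the bounding sequence is subtracted at every t.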

This estimate is indeed a lower bound for $\lambda$, as shown in the following 
theorem. 

Theorem 1. Let $\beta_{n,\alpha}$ be a bounding sequence for $\delta(t)$ at 
level $\alpha$ and let $\hat\lambda$ be defined by (5). Then 

(6) $P(\hat\lambda \le \lambda) \ge 1 - \alpha.$

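Proof. The distribution of p-values $F_n$ is bounded by 
$F_n(t) \le \lambda + (1-\lambda) U_{n_0}(t)$, where $n_0 = (1-\lambda)n$ and 
$U_{n_0}(t)$ is the empirical distribution of $n_0$ independent 
Uniform(0,1)-distributed random variables. Thus 

(7) $P(\hat\lambda > \lambda) \le P\Bigl( \sup_{t \in (0,1)} \frac{\lambda + (1-\lambda) U_{n_0}(t) - t - \beta_{n,\alpha}\delta(t)}{1 - t} > \lambda \Bigr)$

(8) $= P\Bigl( \sup_{t \in (0,1)} (1-\lambda)(U_{n_0}(t) - t) - \beta_{n,\alpha}\delta(t) > 0 \Bigr)$

(9) $= P\Bigl( \sup_{t \in (0,1)} U_{n_0}(t) - t - \frac{n}{n_0}\beta_{n,\alpha}\delta(t) > 0 \Bigr).$

Since $n\beta_{n,\alpha}$ is monotonically increasing, 
$n\beta_{n,\alpha}/n_0 \ge \beta_{n_0,\alpha}$ and the proof follows by 
property (b) in Definition 1. □ 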

2.1. Asymptotic control. Instead of finite-sample control, it is sometimes 
more convenient to resort to asymptotic control. A sequence $\beta_{n,\alpha}$ 
is said to be an asymptotic bounding sequence if $\beta_{n,\alpha}$ satisfies 
condition (a) from Definition 1 and, additionally, a modified condition (b'), 

(10) $\limsup_{n \to \infty} P(V_{n,\delta} > \beta_{n,\alpha}) \le \alpha,$

where $V_{n,\delta}$ is defined as in (3). If we suppose that the absolute 
number of false null hypotheses $n\lambda$ is growing with $n$, that is, 
$n\lambda \to \infty$ for $n \to \infty$, then for an asymptotic bounding 
sequence, 

$\liminf_{n \to \infty} P(\hat\lambda \le \lambda) \ge 1 - \alpha.$

Asymptotic control is typically useful in the following situation. For a given 
bounding function $\delta(t)$ and two sequences $a_n$, $b_n$, consider weak 
convergence of 

(11) $a_n V_{n,\delta} - b_n \xrightarrow{\mathcal{D}} L$

to a distribution $L$. Any sequence $\beta_{n,\alpha}$ that satisfies the 
monotonicity condition (a) of Definition 1 and, additionally, 
$\beta_{n,\alpha} \ge a_n^{-1}(L^{-1}(1-\alpha) + b_n)$, is thus an asymptotic 
bounding sequence at level $\alpha$. 

As an important example, consider the bounding function 
$\delta(t) = \sqrt{t(1-t)}$. The following lemma is due to Jaeschke and can be 
found in [12], page 599, Theorem 1 (18). 

Lemma 1. Let $a_n = \sqrt{2 \log\log n}$ and 
$b_n = 2\log\log n + \frac{1}{2}\log\log\log n - \frac{1}{2}\log 4\pi$. Then 

(12) $a_n \sup_{t \in (0,1)} \frac{\sqrt{n}\,(U_n(t) - t)}{\sqrt{t(1-t)}} - b_n \xrightarrow{\mathcal{D}} E,$

where $E$ is the Gumbel distribution $E(x) = \exp(-\exp(-x))$. 
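
The constants of Lemma 1 translate directly into a candidate asymptotic bounding sequence for the standard deviation-proportional bounding function. The Python sketch below assumes that the supremum in (12) refers to the sqrt(n)-scaled empirical process, which is where the factor n^(-1/2) in the code comes from; the function names are hypothetical.

```python
import math

def gumbel_quantile(prob):
    """Inverse of the Gumbel distribution function E(x) = exp(-exp(-x))."""
    return -math.log(-math.log(prob))

def asymptotic_beta(n, alpha):
    """n^(-1/2) * (E^{-1}(1 - alpha) + b_n) / a_n with a_n, b_n as in Lemma 1.

    Assumption: the limit in (12) is for the sqrt(n)-normalized process.
    Requires n >= 3 so that log log n is positive.
    """
    lln = math.log(math.log(n))
    a_n = math.sqrt(2.0 * lln)
    b_n = 2.0 * lln + 0.5 * math.log(lln) - 0.5 * math.log(4.0 * math.pi)
    return (gumbel_quantile(1.0 - alpha) + b_n) / (a_n * math.sqrt(n))

for n in (10**3, 10**6, 10**9):
    print(n, asymptotic_beta(n, 0.05))
```

Because the convergence in (12) is slow, values obtained this way should be read as approximations rather than as finite-sample guarantees.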



Remark 1. The convergence in (12) is in general slow. Nevertheless, the 
result is of interest here. First, the number of tested hypotheses is potentially 
very large (e.g., 10^^ in the TAOS setting described in the Introduction). 
Moreover, the slow convergence is mainly caused by values of t that are of 
order $1/n$. The expected value of the smallest p-value of true null hypotheses 
is at least $1/n$ and it might be useful to truncate in practice the range over 
which the supremum is taken in (5) to $(1/n, 1 - 1/n)$. Doing so, the following 
asymptotic results are still valid, while the approximation by the Gumbel 
distribution is empirically a good fit even for moderate values of n [6]. 

Similar weak convergence results for other bounding functions can be 
found in [4] or [12]. 

2.2. Bounding functions. The estimate is determined by the choice of 
the function $\delta(t)$, the so-called bounding function, and a suitable 
bounding sequence. 

There are many conceivable bounding functions. Bounding functions of 
particular interest include: 

- linear bounding function $\delta(t) = t$; 

- constant bounding function $\delta(t) = 1$; 

- standard deviation-proportional bounding function $\delta(t) = \sqrt{t(1-t)}$. 

The linear bounding function is closely related to the false discovery rate 
(FDR), as introduced by Benjamini and Hochberg [1]. In the FDR setting, 
the empirical distribution of p-values is compared to the linear function 
$t/\alpha$. The last down-crossing of the empirical distribution over the line 
$t/\alpha$ determines the number of rejections that can be made when 
controlling FDR at level $\alpha$. It is interesting to compare this to the 
current setting. In particular, it follows by a result of Daniels [5] that 

$P\Bigl( \sup_{t \in (0,1)} \frac{U_n(t)}{t} > \frac{1}{\alpha} \Bigr) = \alpha.$

The optimal bounding sequence at level $\alpha$ is thus given for the linear 
bounding function by $\beta_{n,\alpha} = 1/\alpha - 1$. Let $\hat\lambda$ be 
the estimate under the linear bounding function. The estimate hence vanishes, 
that is, $\hat\lambda = 0$, if and only if no rejections can be made under FDR 
control at the same level. Note that the bounding sequence is independent of 
the number of observations. This leads to weak power to detect the full 
proportion $\lambda$ of false null hypotheses when the proportion $\lambda$ is 
rather high but the distribution of p-values under the alternative deviates 
only weakly from the uniform distribution, as shown in 
an asymptotic analysis below. 
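
The stated correspondence with FDR control can be checked numerically. The sketch below compares, on simulated data, whether the estimate under the linear bounding function with bounding sequence 1/alpha - 1 is positive and whether the Benjamini-Hochberg step-up procedure rejects anything. The function names and the simulated p-values are illustrative assumptions, and the estimate is clipped at zero for convenience.

```python
import numpy as np

def bh_rejections(pvalues, alpha):
    """Number of rejections of the Benjamini-Hochberg step-up procedure."""
    p = np.sort(np.asarray(pvalues, dtype=float))
    n = p.size
    below = np.nonzero(p <= alpha * np.arange(1, n + 1) / n)[0]
    return 0 if below.size == 0 else int(below[-1]) + 1

def linear_estimate(pvalues, alpha):
    """Estimate with delta(t) = t and beta = 1/alpha - 1, clipped at zero.

    With this choice F_n(t) - t - beta * t simplifies to F_n(t) - t / alpha.
    """
    p = np.sort(np.asarray(pvalues, dtype=float))
    n = p.size
    ecdf = np.arange(1, n + 1) / n
    keep = (p > 0.0) & (p < 1.0)
    if not keep.any():
        return 0.0
    values = (ecdf[keep] - p[keep] / alpha) / (1.0 - p[keep])
    return max(0.0, float(values.max()))

rng = np.random.default_rng(0)
p = rng.uniform(size=200)
p[:5] = rng.uniform(0.0, 1e-3, size=5)      # a few made-up strong signals
print(linear_estimate(p, 0.05) > 0, bh_rejections(p, 0.05) > 0)
```

Up to ties exactly on the rejection line, which have probability zero for continuous p-values, the two indicators agree.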

An estimate under a constant bounding function was already proposed 
by Genovese and Wasserman [7]. Using the Dvoretzky-Kiefer-Wolfowitz 
(DKW) inequality, a bounding sequence is given by 
$\beta_{n,\alpha} = \sqrt{\log(2/\alpha)/(2n)}$. In contrast to the linear 
bounding function, this bounding sequence vanishes for $n \to \infty$. However, 
the estimate is unable to detect any proportion of false null hypotheses that 
is of smaller order than $1/\sqrt{n}$. The intuitive reason is that the 
bounding function $\delta(t)$ is not vanishing for small values of $t$. Any 
evidence from false null hypotheses, however strong it may be, is hence lost 
if there are just a few false null hypotheses. 
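
To give a feeling for this detection floor, the following Python sketch evaluates a DKW-type bounding sequence for several n. It assumes the two-sided form sqrt(log(2/alpha)/(2n)) of the inequality; the function name is hypothetical.

```python
import math

def dkw_beta(n, alpha):
    """Bounding sequence implied by the two-sided DKW inequality (assumed form)."""
    return math.sqrt(math.log(2.0 / alpha) / (2.0 * n))

for n in (10**3, 10**6, 10**9):
    beta = dkw_beta(n, 0.05)
    # n * beta: rough count of excess small p-values needed before the
    # estimate under the constant bounding function can become positive
    print(n, beta, math.ceil(n * beta))
```

The required excess count grows like the square root of n, so a fixed or slowly growing number of false null hypotheses eventually falls below the floor, however small their p-values are.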

As already argued above, a bounding sequence for the standard deviation-
proportional bounding function is given by 

(13) $\beta_{n,\alpha} = n^{-1/2} a_n^{-1} \bigl( E^{-1}(1-\alpha) + b_n \bigr),$

where $E$ is the Gumbel distribution and $a_n$, $b_n$ are defined as in Lemma 1. 
Note that the bounding sequence is vanishing at almost the same rate as 
for the constant bounding function. In contrast to the constant bounding 
function, however, the standard deviation-proportional bounding function 
vanishes for small $t$. It will be seen that the standard 
deviation-proportional bounding function possesses optimal properties among a 
large class of possible bounding functions. 

2.3. Asymptotic properties of bounding sequences. Faced with an enormous 
number of potential bounding functions, it is of interest to look at 
general properties of bounding functions, especially the asymptotic behavior 
of the resulting estimates. The asymptotic properties turn out to be 
mainly determined by the behavior of $\delta(t)$ close to the origin. 

Definition 3. For every $\nu \in [0,1]$, let $\mathcal{Q}_\nu$ be a family of 
real-valued functions on $[0,1]$. In particular, $\delta(t) \in \mathcal{Q}_\nu$ 
iff: 

(a) $\delta(t)$ is nonnegative and finite on $[0,1]$ and strictly positive on 
$(0,1)$; 

(b) $\delta(1-t) \ge \delta(t)$ for $t \in (0, 1/2)$; 

(c) the function $\delta(t)$ is regularly varying with power $\nu$, that is, 
$\lim_{t \to 0} \delta(bt)/\delta(t) = b^{\nu}$ for every $b > 0$. 

Most bounding functions of interest are members of $\mathcal{Q}_\nu$ for some 
value of $\nu \in [0,1]$. The constant bounding function is a member of 
$\mathcal{Q}_0$, while the linear bounding function is a member of 
$\mathcal{Q}_1$ and the standard deviation-proportional bounding function is a 
member of $\mathcal{Q}_{1/2}$. 

It holds in general for any bounding function that bounding sequences 
cannot be of smaller order than the inverse square root of $n$. In particular, 
note that by Definition 1 of a bounding sequence, it has to hold for any 
$t \in (0,1)$ that $P(U_n(t) - t - \beta_{n,\alpha}\delta(t) > 0) \le \alpha$ 
for all $n \in \mathbb{N}$. Since $nU_n(t) \sim B(n,t)$ is binomially 
distributed with mean $nt$ and variance proportional to $n$, it follows indeed 
that 

$\liminf_{n \to \infty} n^{1/2} \beta_{n,\alpha} > 0.$

Consider now bounding functions $\delta(t)$ which are members of 
$\mathcal{Q}_\nu$ with some $\nu \in (1/2, 1]$. It follows directly from 
Theorem 1.1 (iii) in [4], page 255, that a more restrictive assumption has to 
hold in this case, namely 

(14) $\liminf_{n \to \infty} n^{1-\nu} \beta_{n,\alpha} > 0.$

For $\nu = 1$ this amounts to $\liminf_{n \to \infty} \beta_{n,\alpha} > 0$. The 
linear bounding function is a member of $\mathcal{Q}_1$, explaining the lack of 
convergence to zero of the corresponding optimal bounding sequence 
$1/\alpha - 1$. 

For bounding functions $\delta(t) \in \mathcal{Q}_\nu$ with $\nu \in [0, 1/2]$, 
there exists some constant $c > 0$ so that $c\,\delta(t)^2 \ge t(1-t)$. Hence, 
using Lemma 1, there exist bounding sequences so that 

(15) $\limsup_{n \to \infty} \Bigl( \frac{n}{\log\log n} \Bigr)^{1/2} \beta_{n,\alpha} < \infty.$

The different asymptotic behavior of the bounding sequences influences the 
asymptotic power to detect false null hypotheses, as will be seen 
subsequently. 

3. Power. We examine the influence of the bounding function $\delta(t)$ on 
the power to detect false null hypotheses. For simplicity of exposition, it is 
assumed that the p-values of all false null hypotheses follow a common 
distribution $G$, while p-values of true null hypotheses have a uniform 
distribution on $[0,1]$. For some $\gamma \in (0,1)$, let 

$\lambda \sim n^{-\gamma}.$

A value of $\gamma = 0$ corresponds to a fixed proportion of false null 
hypotheses, while $\gamma = 1$ corresponds to a fixed absolute number of false 
null hypotheses. Here all cases between those two extremes are considered. 

Bounding sequences with vanishing level. For the asymptotic analysis, 
it is convenient to let $\alpha = \alpha_n$ decrease monotonically for 
$n \to \infty$, so that $\alpha_n \to 0$ for $n \to \infty$. Note that 
$\alpha_n \to 0$ is equivalent to 
$P(V_{n,\delta} > \beta_{n,\alpha_n}) \to 0$ for $n \to \infty$. For notational 
simplicity, this assumption is strengthened slightly to 

(16) $V_{n,\delta}/\beta_{n,\alpha_n} \to_p 0, \qquad n \to \infty.$

In almost all cases of interest, (16) and $\alpha_n \to 0$ are equivalent. To 
maintain reasonable power, one would like to avoid letting the level 
$\alpha_n$ vanish too fast as $n \to \infty$. For bounding functions 
$\delta(t) \in \mathcal{Q}_\nu$ with $\nu \in [0, 1/2]$ it is required that 

(17) $\limsup_{n \to \infty} \Bigl( \frac{n}{\log n} \Bigr)^{1/2} \beta_{n,\alpha_n} < \infty.$

It follows from (15) that it is always possible to find a sequence 
$\alpha_n \to 0$ so that both (16) and (17) are satisfied. If both (16) and 
(17) are satisfied, the sequence $\alpha_n$ is said to vanish slowly. For 
bounding functions $\delta(t) \in \mathcal{Q}_\nu$ with $\nu \in (1/2, 1]$, it 
will be seen below that the power is poor no matter how slowly the sequence 
$\alpha_n$ vanishes for $n \to \infty$. 

3.1. Case I: many false null hypotheses, $\gamma \in [0, 1/2)$. The 
fluctuations in the empirical distribution function are negligible compared to 
the signal from false null hypotheses if $\gamma \in [0, 1/2)$. Hence one 
should be able to detect (asymptotically) the full proportion of false null 
hypotheses in this first setting. 

This is indeed achieved, as long as we look for bounding functions in 
$\mathcal{Q}_\nu$ with $\nu \in [0, 1/2]$, as shown below. If on the other hand 
$\nu \in (1/2, 1]$, one is in general unable to detect the full proportion of 
false null hypotheses. The proportion of detected false null hypotheses even 
converges in probability to zero for large values of $\gamma$ if $\nu$ is in 
the range $(1/2, 1]$. This includes in particular the linear FDR-style 
bounding function $t \in \mathcal{Q}_1$, which is only able to detect a 
nonvanishing proportion of false null hypotheses (asymptotically) as long 
as the proportion $\lambda$ is bounded from below, which is only satisfied for 
$\gamma = 0$. 

Theorem 2. Let $G$ be continuous and let $\inf_{t \in (0,1)} G'(t) = 0$. Let 
$\hat\lambda$ be the estimate under bounding function 
$\beta_{n,\alpha}\delta(t)$, where $\delta(t) \in \mathcal{Q}_\nu$ with 
$\nu \in [0,1]$ and $\beta_{n,\alpha}$ is a bounding sequence. If 
$\nu \in [0, 1/2]$ and $\alpha_n$ vanishes slowly, then, for all 
$\gamma \in [0, 1/2)$, 

$\hat\lambda/\lambda \to_p 1, \qquad n \to \infty.$

However, for $\nu \in (1/2, 1]$ and $\gamma \in (1-\nu, 1/2)$, 

$\hat\lambda/\lambda \to_p 0, \qquad n \to \infty.$

Remark 2. The case $\inf_{t \in (0,1)} G'(t) = 0$ corresponds to the "pure" 
case in [7]. If $\inf_{t \in (0,1)} G'(t) > 0$, the results above (and below) 
hold if $\lambda$ is replaced by 

$\tilde\lambda = \bigl( 1 - \inf_{t \in (0,1)} G'(t) \bigr) \lambda.$

Without making parametric assumptions about the distribution $G$ under 
the alternative, identifying $\tilde\lambda$ is indeed the best one can hope 
for. 

The message from Theorem 2 is that one should look for bounding functions 
in $\mathcal{Q}_\nu$ with $\nu \in [0, 1/2]$. This guarantees proper behavior 
of the estimate if the proportion $\lambda$ of false null hypotheses is 
vanishing more slowly than the inverse square root of the number of 
observations. 

3.2. Case II: few false null hypotheses, $\gamma \in [1/2, 1)$. As seen above, 
bounding functions in $\mathcal{Q}_\nu$ with $\nu \le 1/2$ detect 
asymptotically the full proportion $\lambda$ of false null hypotheses if 
$\lambda$ is vanishing not as fast as the inverse square root of the number of 
observations. 

For $\gamma > 1/2$, no method can detect asymptotically the full proportion of 
false null hypotheses if the distribution under the alternative is fixed. For 
a fixed nondegenerate alternative, the majority of p-values from false null 
hypotheses fall with high probability into a fixed interval that is bounded 
away from zero. The fluctuations of the empirical distribution function in 
such an interval are asymptotically infinitely larger than any signal from 
false null hypotheses if $\gamma > 1/2$, which makes detection of the full 
proportion of false null hypotheses impossible. 

It is hence interesting to consider cases where the signal from false null 
hypotheses is increasing in strength. Therefore, let $G = G^{(n)}$, the 
distribution of p-values under the alternative, be a function of the number 
$n$ of tests to conduct. The superscript is dropped in the following for 
notational simplicity. 

Shift-location testing. It is perhaps helpful to think about $G$ as being 
induced by some shift-location testing problem. For each test it is assumed 
that there is a test statistic $Z_i$, which follows some distribution $T_0$ 
under the null hypothesis $H_{0,i}$ and some shifted distribution $T_{\mu_n}$ 
under the alternative, 

(18) $H_{0,i}: Z_i \sim T_0, \qquad H_{1,i}: Z_i \sim T_{\mu_n}.$

In the Gaussian case this amounts, for example, to $T_0 = \mathcal{N}(0,1)$ and 
$T_{\mu_n} = \mathcal{N}(\mu_n, 1)$. To have an interesting problem, one needs 
for $\gamma \in (1/2, 1)$ in general that the shift between the null and 
alternative hypotheses be increasing for an increasing number of tests; that 
is, $\mu_n \to \infty$ for $n \to \infty$. 

On the other hand, one would like to keep the problem subtle. For the 
Gaussian case it was shown by Donoho and Jin [6] that an interesting scaling 
is given by $\mu_n = \sqrt{2r \log n}$ with $r \in (0,1)$. In this regime, the 
smallest p-value stems with high probability from a true null hypothesis. The 
false null hypotheses have hence little influence on the extremes of the 
distribution. 
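
The Gaussian version of this setting is easy to simulate. The Python sketch below draws test statistics for n tests, shifts roughly n^(1-gamma) of them by sqrt(2 r log n), and converts them to one-sided p-values. The function name, the rounding of the number of false nulls and the choice of upper-tail p-values are illustrative assumptions.

```python
import math
import numpy as np

def sparse_gaussian_pvalues(n, gamma, r, seed=0):
    """One-sided p-values with about n^(1 - gamma) false null hypotheses.

    True nulls are N(0, 1); false nulls are N(mu_n, 1), mu_n = sqrt(2 r log n).
    The false nulls occupy the first n_false positions of the output.
    """
    rng = np.random.default_rng(seed)
    n_false = int(round(n ** (1.0 - gamma)))       # proportion of order n^(-gamma)
    mu_n = math.sqrt(2.0 * r * math.log(n))
    z = rng.standard_normal(n)
    z[:n_false] += mu_n
    # Upper-tail normal probability via the complementary error function
    pvals = np.array([0.5 * math.erfc(v / math.sqrt(2.0)) for v in z])
    return pvals, n_false, mu_n

pvals, n_false, mu_n = sparse_gaussian_pvalues(n=10**5, gamma=0.6, r=0.5)
print(n_false, mu_n, np.median(pvals[:n_false]), pvals[n_false:].min())
```

For r below 1 the smallest p-value among the true nulls is typically far smaller than the median p-value among the false nulls, which is the sense in which the problem stays subtle.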

Instead of assuming Gaussianity of the test statistics, Donoho and Jin [6] 
considered a variety of different distributions. Under a generalized Gaussian 
(Subbotin) distribution, the density is for some positive value of $\kappa$ 
proportional to 

$\exp(-|z|^{\kappa}/\kappa).$

The case $\kappa = 2$ clearly corresponds to a Gaussian distribution; 
$\kappa = 1$ corresponds to the double exponential case. The shift parameter 
is then chosen as 

(19) $\mu_n = (\kappa r \log n)^{1/\kappa}$

for some $r \in (0,1)$. Note that the expectation of the smallest p-value from 
true null hypotheses vanishes like $n^{-1}$, whereas under the scaling (19), 
the median p-value of false null hypotheses vanishes like $n^{-r}$ for 
$n \to \infty$ with some $r \in (0,1)$. In fact, consider for any member of the 
generalized Gaussian Subbotin distribution the $q$-quantile $G^{-1}(q)$ of the 
distribution of p-values under the alternative. For some constant $c_q$, the 
$q$-quantile is proportional to 

$G^{-1}(q) \propto \int_{\mu_n + c_q}^{\infty} \exp(-x^{\kappa}/\kappa)\, dx.$

Applying l'Hôpital's rule twice, it follows for any $c$ and $\kappa > 0$ that 

$\lim_{a \to \infty} \frac{\log \int_{a+c}^{\infty} \exp(-x^{\kappa}/\kappa)\, dx}{a^{\kappa}/\kappa} = -1.$

Thus, under the scaling (19), for every $q \in (0,1)$ and positive $\kappa$, 
the scaling of the $q$-quantile is given by 

(20) $\log G^{-1}(q) \sim -r \log n.$

With probability converging to 1 for $n \to \infty$, a p-value under a false 
null hypothesis is hence larger than the smallest p-value from all true null 
hypotheses as long as $r \in (0,1)$. For $r > 1$, the problem gets trivial as 
the probability that an arbitrarily high proportion of p-values under false 
null hypotheses is smaller than the smallest p-value from all true null 
hypotheses converges to 1 for $n \to \infty$. 

The point of introducing the shift-location model under generalized Gaussian 
Subbotin distributions was just to identify (20) with $r \in (0,1)$ as the 
interesting scaling behavior of quantiles of $G$, the p-value distribution for 
alternative hypotheses. The setting (20) is potentially of interest beyond any 
shift-location model. We adopt the scaling (20) for the following discussion 
without making any explicit distributional assumptions about underlying 
test statistics. 

Theorem 3. Let $\lambda \sim n^{-\gamma}$ with $\gamma \in [1/2, 1)$ and let the 
distribution $G$ of p-values under the alternative satisfy (20) for some 
$r \in (0,1)$. Let $\hat\lambda$ be the estimate of $\lambda$ under a bounding 
function $\beta_{n,\alpha}\delta(t)$, where $\delta(t) \in \mathcal{Q}_\nu$ 
with $\nu \in [0, 1/2]$ and $\beta_{n,\alpha_n}$ is a bounding sequence for 
$\delta(t)$. Let $\alpha_n$ vanish slowly. If 
$r > \frac{1}{\nu}(\gamma - \frac{1}{2})$, then 

(21) $\hat\lambda/\lambda \to_p 1.$

If, on the other hand, $r < \frac{1}{\nu}(\gamma - \frac{1}{2})$, then 

(22) $\hat\lambda/\lambda \to_p 0.$

Remark 3. The analysis was only carried out for functions with 
$\nu \in [0, 1/2]$ due to the deficits of the functions with 
$\nu \in (1/2, 1]$ discussed in the previous section. Nevertheless, it would be 
possible to carry out the same analysis here. For $\nu = 1$, one obtains, for 
example, a critical boundary $r > \gamma$. 

The message from the last theorem is that among all bounding functions 
in $\mathcal{Q}_\nu$ with $\nu \in [0, 1/2]$, it is best to choose a member of 
$\mathcal{Q}_{1/2}$. Bounding functions in $\mathcal{Q}_{1/2}$ increase the 
chance to detect the full proportion $\lambda$ of false null hypotheses, as 
illustrated for a few special cases in Figure 1. The area in the $(r, \gamma)$ 
plane where $\hat\lambda/\lambda$ converges in probability to 1 for a bounding 
function in $\mathcal{Q}_{1/2}$ includes in particular all areas of convergence 
for bounding functions in $\mathcal{Q}_\nu$ with $\nu \in [0, 1/2]$. 
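
The regions described by Theorems 2 and 3 can be encoded in a few lines. The Python sketch below assumes that the boundary in Theorem 3 reads r > (gamma - 1/2)/nu, interpreted as never satisfied for nu = 0, and that every r is covered when gamma is below 1/2. The function name is hypothetical and nothing is claimed about points lying exactly on a boundary.

```python
def full_detection(r, gamma, nu):
    """True if the estimate asymptotically recovers the full proportion.

    Assumed regions: any r when gamma < 1/2 (Theorem 2), and
    r > (gamma - 1/2) / nu when gamma >= 1/2 (Theorem 3), for nu in [0, 1/2].
    """
    if not (0.0 <= nu <= 0.5 and 0.0 <= gamma < 1.0 and 0.0 < r < 1.0):
        raise ValueError("outside the range covered by Theorems 2 and 3")
    if gamma < 0.5:
        return True
    if nu == 0.0:
        return False              # (gamma - 1/2) / nu is read as +infinity
    return r > (gamma - 0.5) / nu

# The same (r, gamma) pair under three choices of nu
for nu in (0.0, 0.25, 0.5):
    print(nu, full_detection(r=0.6, gamma=0.75, nu=nu))
```

For r = 0.6 and gamma = 0.75 only nu = 1/2 returns True, in line with the statement that members of the class with power 1/2 cover the largest region.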

3.3. Connection to the familywise error rate. A different estimate of 
$\lambda$ is obtained by controlling the familywise error rate (FWER). In 
particular, let the estimate be the total number of p-values less than the 
FWER threshold $\alpha/n$, divided by the total number of hypotheses, 

$\hat\lambda = n^{-1} \sum_{i=1}^{n} 1\{P_i < \alpha/n\}.$

This is an estimate of $\lambda$ with the desired property 
$P(\hat\lambda > \lambda) \le \alpha$. Controlling the familywise error rate 
has often been criticized for lack of power. Indeed, in the asymptotic 
analysis above it is straightforward to show that the area in the 
$(r, \gamma)$ plane where $\hat\lambda/\lambda \to_p 1$ is restricted to the 
half-plane $r > 1$ (neglecting again what happens directly on the border 
$r = 1$). In comparison to other estimates proposed here, the familywise error 
rate is hence particularly bad for estimating $\lambda$ if there are many 
false null hypotheses, each with a very weak signal. In addition, the 
construct requires that p-values can be determined accurately down to 
precision $\alpha/n$, which might be prohibitively small. In contrast, the 
performance of estimates of the form (5) does not 
deteriorate significantly if p-values are truncated at larger values. 
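
The FWER-based estimate is a one-line computation. The sketch below uses a hand-made set of p-values, chosen only for illustration, in which half of the false null hypotheses carry a strong signal and half a weak one; the function name is hypothetical.

```python
import numpy as np

def fwer_estimate(pvalues, alpha):
    """Fraction of p-values strictly below the Bonferroni threshold alpha / n."""
    p = np.asarray(pvalues, dtype=float)
    return float(np.mean(p < alpha / p.size))

# 10000 tests: 50 strong signals, 50 weak signals, 9900 evenly spread nulls
pvalues = np.concatenate([np.full(50, 1e-9),
                          np.full(50, 1e-3),
                          np.linspace(0.01, 1.0, 9900)])
print(fwer_estimate(pvalues, alpha=0.05))   # 0.005, half of the true proportion 0.01
```

The weak signals at 1e-3 lie above the threshold 5e-6 and are ignored entirely, which illustrates the loss of power when there are many false null hypotheses with weak signals.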

The drawbacks of the familywise error rate are a consequence of the 
stricter inference one is trying to make when controlling the familywise error 
rate. In particular, one is trying to infer exactly which hypotheses are false 
nulls as opposed to only how many false nulls there are in total. The loss in 
power is hence the price one pays for this more ambitious goal. 

3.4. Connection to higher criticism. A connection of the proposed estimate 
to the higher criticism method of Donoho and Jin [6] for detection of sparse 
heterogeneous mixtures emerges. In their setup p-values $P_i$, 
$i = 1, \ldots, n$, are i.i.d. according to a mixture distribution 

$P_i \sim (1 - \lambda^{*}) H + \lambda^{*} G,$

where $H$ is the uniform distribution and $G$ the distribution of p-values 
under the alternative hypothesis. In [6] the focus is on testing the global 
null hypothesis that there are no false null hypotheses at all, 

$H_0: \lambda^{*} = 0.$

In contrast, in this current paper we are interested in quantifying the 
proportion $\lambda$ of false null hypotheses. The proportion $\lambda$ of 
false null hypotheses, as defined for the current paper in (1), can be viewed 
as a realization of a random variable with a binomial distribution, 
$n\lambda \sim B(n, \lambda^{*})$. For the asymptotic considerations of this 
paper, however, the distinction between $\lambda$ and $\lambda^{*}$ is of 
little importance because the ratio $\lambda/\lambda^{*}$ converges almost 
surely to 1 for $n \to \infty$. 

Fig. 1. For $\nu = 0$ (left), $\nu = 1/2$ (second from left) and $\nu = 1$ 
(second from right), an illustration of the asymptotic properties of the 
estimate $\hat\lambda$. The shaded area marks those areas in the $(r, \gamma)$ 
plane where $\hat\lambda/\lambda \to_p 1$, whereas for the white areas 
$\hat\lambda/\lambda \to_p 0$. The choice $\nu = 1/2$ is seen to be optimal. The 
corresponding plot for control of the familywise error rate is shown on the 
right for comparison. 

The two goals of higher criticism and the current paper are connected. If 
there is evidence for a positive proportion of false null hypotheses with the 
proposed method, then the global null $H_0$ can clearly be rejected. In other 
words, if one obtains a positive estimate $\hat\lambda > 0$ with 
$P(\hat\lambda > \lambda) \le \alpha$, then the global null hypothesis 
$H_0: \lambda^{*} = 0$ can be rejected at level $\alpha$. Note that the 
level is correct even for finite samples and not just asymptotically. 

The connection between the two methods works as well in the reverse 
direction if an optimal bounding function is chosen. It emerged in particular 
from the analysis above that bounding functions that are members of 
$\mathcal{Q}_{1/2}$ have optimal asymptotic properties. For the particular 
choice of a standard deviation-proportional bounding function in 
$\mathcal{Q}_{1/2}$, let $\hat\lambda$ be an estimate of $\lambda$ and let 
$\beta_{n,\alpha}$ be a bounding sequence that satisfies 

$\beta_{n,\alpha} = n^{-1/2} (2 \log\log n)^{1/2} (1 + o(1)).$

Donoho and Jin [6] are not specific about choice of a critical value for higher 
criticism. However, choosing $\sqrt{n}\,\beta_{n,\alpha}$ as a critical value 
meets their requirements. The higher criticism procedure rejects in this case 
if and only if the estimate $\hat\lambda$ of the proportion of false null 
hypotheses is positive, 

$\{\text{reject } H_0: \lambda^{*} = 0 \text{ with higher criticism}\} = \{\hat\lambda > 0\}.$

If both $\lambda \sim n^{-\gamma}$ and $\lambda^{*} \sim n^{-\gamma}$ for some 
$\gamma \in [0,1]$, the question arises if the area in the $(\gamma, r)$ plane 
where 

(23) $P(\text{higher criticism rejects } H_0) \to 1$

is identical to the area where 

(24) $\hat\lambda/\lambda \to_p 1.$

Intuitively, it is clear that it is somewhat easier to test the global 
null hypothesis $H_0$, as done in higher criticism, than to estimate the 
precise proportion $\lambda$ of false null hypotheses, as done in this paper. 
One would therefore expect that the area of convergence in the $(\gamma, r)$ 
plane of (23) includes the area of convergence of (24). 

It is hence perhaps surprising that for some cases the areas of convergence 
in the $(\gamma, r)$ plane of (23) and (24) agree. To illustrate the point, 
consider again the shift-location model (18) under a generalized Gaussian 
Subbotin distribution with parameter $\kappa \in (0, 2]$ and a shift (19) of 
test statistics under the alternative. 

The area in the $(\gamma, r)$ plane where $\hat\lambda/\lambda \to_p 1$ is in 
this setting independent of the parameter $\kappa$. The detection boundary for 
higher criticism, however, does depend on $\kappa$. For the Gaussian case 
($\kappa = 2$) and in general for $\kappa > 1$, the detection boundary for 
higher criticism is, for $\gamma \in (1/2, 1)$, below the area where 
$\hat\lambda/\lambda \to_p 1$. The reason for this is intuitively clear. The 
higher criticism method looks in these cases for evidence against $H_0$ in the 
extreme tails of the distribution $G$; see [6]. At these points, only a 
vanishing proportion of all p-values from false null hypotheses can be found. 
If one is trying to estimate the full proportion of false null hypotheses, the 
evidence for a certain amount of false null hypotheses has to be found at less 
extreme points, where one can expect a significant proportion of p-values from 
false null hypotheses. This limits the region of convergence in the sense of 
(24) compared to the area where higher criticism can successfully reject the 
global null hypothesis $H_0$. 

However, for $\kappa \le 1$ (including thus the case of a double-exponential 
distribution) the two areas where (23) and (24) hold, respectively, are 
identical, as shown in Figure 2. In the white area, both higher criticism and 
the current method fail to detect (asymptotically) the presence of false null 
hypotheses, and not even the likelihood ratio test is able to reject in these 
cases (asymptotically) the global null hypothesis $H_0$ that there are only 
true null hypotheses [6]. It is hence of interest to see that for 
$\kappa \le 1$, $\hat\lambda/\lambda \to_p 1$ holds whenever the likelihood 
ratio test succeeds (asymptotically) in rejecting the global null hypothesis. 

4. Numerical examples. It emerged from the analysis above that the
standard deviation-proportional bounding function is optimal in an
asymptotic sense. In the following discussion we briefly compare various bounding
functions for a moderate number of tests, $n = 1000$. The setup is identical
to the shift-location testing of Section 3.2, equation (18). For true null
hypotheses, test statistics follow the normal distribution $\mathcal{N}(0, 1)$. For false null
hypotheses, test statistics are shifted by an amount $\mu > 0$ and are
$\mathcal{N}(\mu, 1)$-distributed.
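The setup above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' simulation code: it assumes one-sided p-values $1 - \Phi(x)$, evaluates the estimate of (5) with the standard deviation-proportional bounding function $\delta(t) = \sqrt{t(1 - t)}$, and takes the bounding value `beta` as an input. All function names are hypothetical, and the value `beta=0.1` in the usage lines is a placeholder rather than a calibrated bounding sequence.

```python
import math
import random


def one_sided_p_value(x):
    """P(Z > x) for a standard normal Z (one-sided test: an assumption)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def simulate_p_values(n, lam, mu, rng):
    """n p-values: round(lam * n) from N(mu, 1) statistics, the rest from N(0, 1)."""
    n1 = round(lam * n)
    stats = [rng.gauss(mu, 1.0) for _ in range(n1)]
    stats += [rng.gauss(0.0, 1.0) for _ in range(n - n1)]
    return [one_sided_p_value(x) for x in stats]


def sd_proportional(t):
    """Standard deviation-proportional bounding function sqrt(t (1 - t))."""
    return math.sqrt(t * (1.0 - t))


def lower_bound_estimate(p_values, beta, delta=sd_proportional):
    """sup over t in (0, 1) of (F_n(t) - t - beta * delta(t)) / (1 - t).

    Between two consecutive p-values F_n is constant and, for this delta,
    the ratio is decreasing in t, so it is enough to look at the sorted
    p-values themselves and at the limit t -> 0+.
    """
    n = len(p_values)
    best = -beta * delta(0.0)  # limit t -> 0+, where F_n(t) = 0
    for rank, t in enumerate(sorted(p_values), start=1):
        if 0.0 < t < 1.0:
            value = (rank / n - t - beta * delta(t)) / (1.0 - t)
            best = max(best, value)
    return best


# Usage: ratio of estimated to true proportion for one simulated data set.
rng = random.Random(1)
p = simulate_p_values(n=1000, lam=0.2, mu=3.0, rng=rng)
ratio = lower_bound_estimate(p, beta=0.1) / 0.2  # beta = 0.1 is a placeholder
print(f"estimated / true proportion: {ratio:.2f}")
```

Repeating the last lines over a grid of shifts $\mu$ and over many seeds would give curves of the kind summarized in Figure 3, provided `beta` is replaced by a proper bounding sequence for the chosen level.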

The proportion $\hat\lambda/\lambda$ of correctly identified false null hypotheses is
computed for various values of the shift parameter $\mu$ and three bounding
functions. The results for 100 simulations are shown in Figure 3. The left column
shows results for very few false null hypotheses ($\lambda = 0.01$), corresponding to
10 false null hypotheses, while results are shown in the right column for a
moderately large number of false null hypotheses ($\lambda = 0.2$).

Fig. 2. Comparison between the estimate of $\lambda$ and detection regions under higher
criticism if test statistics follow the location-shift model (18) and are distributed according
to the generalized Gaussian Subbotin distribution with shift parameter (19). The shaded
area in the left panel shows again the area of convergence in probability of $\hat\lambda/\lambda$ to 1 for a
bounding function in the class $\mathcal{Q}_{1/2}$. The shaded area in the right panel corresponds to the
region where higher criticism can reject asymptotically the null hypothesis $H_0 : \lambda = 0$ for
$\kappa \le 1$, including the double-exponential case. The line below marks the detection boundary
for the Gaussian case ($\kappa = 2$).

Fig. 3. The proportion $\hat\lambda/\lambda$ of correctly detected false null hypotheses as a function of the
separation $\mu$. Results are shown for the standard deviation-proportional bounding function
(top row), the constant bounding function (middle row), and the linear bounding function
(bottom row).

Fig. 4. Random samples of the weighted empirical distribution function $(U_n(t) - t)/\delta(t)$
with $\delta(t) = \sqrt{t(1 - t)}$ on the left. Various bounding sequences $\beta_{n,\alpha}$ as a function of $n$ in
log-log scale on the right: the asymptotically valid bounding sequence (solid line), and the
bounding sequences for the intervals $(0, 1)$ (dotted line), $(1/n, 1 - 1/n)$ (upper dashed line)
and $(1/n, 0.01)$ (lower dashed line), as obtained by simulation. Note that the latter two are
almost indistinguishable.



For very few false null hypotheses ($\lambda = 0.01$), both the standard
deviation-proportional and linear bounding functions identify a substantial proportion
of false null hypotheses if the shift $\mu$ is larger than about 3. The expected
value of the largest test statistic from true null hypotheses is, for comparison
in the current setup, at around 3.7. The constant bounding function ($\nu = 0$)
fails to identify any of the 10 false null hypotheses even for very large shifts
$\mu$. This is in line with the theoretical results from Section 3.2. For a
moderately large number of false null hypotheses ($\lambda = 0.2$), the performance of
the linear bounding function is worse than for the other two bounding
functions, as expected from the asymptotic results in Section 3.1. The standard
deviation-proportional bounding function ($\nu = 1/2$) in both cases
consistently identifies the most false null hypotheses, and the optimality of this
bounding function is thus numerically evident for moderate sample sizes as
well.

For the standard deviation-proportional bounding function ($\nu = 1/2$),
asymptotic control was proposed in (10). The result relies on convergence of
the supremum of a weighted empirical distribution to the Gumbel
distribution. This convergence is in general slow, as already mentioned in Remark 3.
The convergence is comparatively fast, however, if the region over which the
supremum is taken is restricted to, say, $(1/n, 1 - 1/n)$, as observed by Donoho
and Jin [6]. We illustrate this in the following text. Restricting the
interval over which the supremum is taken in (5) to some interval $(a, b)$ with
$0 \le a < b \le 1$, bounding sequences can be defined analogous to Definition 1
by requirement (b) in Definition 1 and

$$\beta_{n,\alpha} = \min\Bigl\{\beta : P\Bigl(\sup_{t \in (a,b)} \frac{U_n(t) - t}{\delta(t)} > \beta\Bigr) < \alpha\Bigr\}. \tag{25}$$

Bounding sequences for the interval $(0, 1)$ satisfy (25) for every interval $(a, b)$,
but might be unduly conservative. Less conservative bounding sequences can
be found conveniently by approximating the probability of
$\sup_{t \in (a,b)} (U_n(t) - t)/\delta(t) > \beta$ with the empirical proportion of occurrence of this event among
a large number of simulations. This is illustrated in the left panel in Figure 4.
Shown are five random samples of the weighted empirical distribution
$(U_n(t) - t)/\delta(t)$ for $n = 200$ and $\delta(t) = \sqrt{t(1 - t)}$. Let the value $\beta$ correspond
to the lower bound of the gray area in Figure 4. For an interval $(a, b) =
(0, 0.4)$, the event $\sup_{t \in (a,b)} (U_n(t) - t)/\delta(t) > \beta$ corresponds then to the event
that a realization of a weighted empirical distribution crosses the gray area.
The bounding sequences obtained by using 1000 simulations of the weighted
empirical distribution are shown in the right panel in Figure 4 for various
intervals $(a, b)$.
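The simulation recipe just described can be sketched as follows. This is an illustrative reading of (25) under stated assumptions, not the authors' code: uniform samples stand in for p-values under the global null, $\delta(t) = \sqrt{t(1 - t)}$, and the minimum in (25) is approximated by the empirical $(1 - \alpha)$ quantile of the simulated suprema. The function names are hypothetical.

```python
import math
import random


def sup_weighted_ecdf(sample, a=0.0, b=1.0):
    """Supremum over t in (a, b) of (U_n(t) - t) / sqrt(t (1 - t)).

    For a fixed value of U_n the ratio is decreasing in t, so on each flat
    piece of the empirical distribution the supremum sits at the left end:
    either at an order statistic inside (a, b) or at the endpoint a itself.
    """
    n = len(sample)
    u = sorted(sample)
    if a > 0.0:
        count_a = sum(1 for x in u if x <= a)  # n * U_n(a)
        best = (count_a / n - a) / math.sqrt(a * (1.0 - a))
    else:
        best = 0.0  # limit of the ratio as t -> 0+, where U_n(t) = 0
    for rank, t in enumerate(u, start=1):
        if a < t < b:
            best = max(best, (rank / n - t) / math.sqrt(t * (1.0 - t)))
    return best


def simulated_bound(n, alpha, a=0.0, b=1.0, n_sim=1000, seed=0):
    """Empirical (1 - alpha) quantile of the supremum over n_sim simulations."""
    rng = random.Random(seed)
    sups = sorted(
        sup_weighted_ecdf([rng.random() for _ in range(n)], a, b)
        for _ in range(n_sim)
    )
    index = min(n_sim - 1, math.floor((1.0 - alpha) * n_sim))
    return sups[index]


# Usage: compare the full interval with the restricted one for n = 200.
n = 200
full = simulated_bound(n, alpha=0.05)
restricted = simulated_bound(n, alpha=0.05, a=1.0 / n, b=1.0 - 1.0 / n)
print(f"(0, 1): {full:.3f}   (1/n, 1 - 1/n): {restricted:.3f}")
```

Because both calls use the same seed, they see the same simulated samples, so the restricted value can never exceed the one for $(0, 1)$; how large the gap is depends on $n$ and $\alpha$.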

There are two main conclusions. First, one might suspect that p-values
from false null hypotheses are mostly found in a neighborhood around zero.
Restricting the region in (5) to such a neighborhood promises thus to capture
all p-values from false null hypotheses while allowing for smaller bounding
sequences. However, the numerical results suggest otherwise. The bounding
sequence for the region $(1/n, 0.01)$ is, for example, almost indistinguishable
from the bounding sequence for the region $(1/n, 1 - 1/n)$, as can be seen in
Figure 4.

Second, the agreement of the asymptotically valid bounding sequence (13)
with the bounding sequence that is obtained by simulation for the interval
$(1/n, 1 - 1/n)$ is very good even for moderate sample sizes, while the
agreement is not so good for the interval $(0, 1)$. When using the asymptotically
valid bounding sequence it is hence advisable to restrict the region over
which the supremum is taken in (5) to $(1/n, 1 - 1/n)$. This ensures that the
true level is close to the chosen level $\alpha$ for moderate sample sizes.

For practical applications, we hence recommend that one calculate the
supremum in (5) over a region $(1/n, 1 - 1/n)$ and use the standard
deviation-proportional bounding function with the asymptotically valid bounding
sequence (13). The asymptotic results of the previous sections hold for this
modified procedure.

5. Proofs. 

Proof of Theorem 2. First it is shown that, as long as $\gamma \in (0, 1/2)$ and
$\nu \le 1/2$, for any given $\varepsilon > 0$,

$$P(\hat\lambda < (1 - \varepsilon)\lambda) \to 0, \qquad n \to \infty. \tag{26}$$

Let the empirical distribution of p-values be defined as in (4) by $F_n(t) =
n^{-1} \sum_{i=1}^{n} 1\{P_i \le t\}$. We suppose that the proportion of false null hypotheses
is fixed at $\lambda$, so that $F_n(t)$ is a mixture $F_n(t) = \lambda G_{n_1}(t) + (1 - \lambda) U_{n_0}(t)$,
where $G_{n_1}(t)$ is the empirical distribution of $n_1 = \lambda n$ i.i.d. p-values with
distribution $G$ and $U_{n_0}(t)$ is the empirical distribution of $n_0 = (1 - \lambda)n$ i.i.d.
p-values with uniform distribution $U$. For any $t \in (0, 1)$,

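\begin{align}
\hat\lambda &= \sup_{t \in (0,1)} \frac{F_n(t) - t - \beta_{n,\alpha_n}\delta(t)}{1 - t} \tag{27} \\
&\ge \frac{F(t) - t}{1 - t} + \frac{F_n(t) - F(t) - \beta_{n,\alpha_n}\delta(t)}{1 - t} \tag{28} \\
&= \lambda\,\frac{G(t) - t}{1 - t} + \frac{F_n(t) - F(t) - \beta_{n,\alpha_n}\delta(t)}{1 - t}. \tag{29}
\end{align}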

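Because $\inf_{t \in (0,1)} G'(t) = 0$ and, hence, $\sup_{t \in (0,1)} (G(t) - t)/(1 - t) = 1$, there
exists by continuity of $G(t)$ some $t_1$ so that $(G(t_1) - t_1)/(1 - t_1) > (1 - \varepsilon/2)$.
Setting $\tilde\varepsilon = \frac{1}{2}(1 - t_1)\varepsilon$, it suffices to show that for every $\tilde\varepsilon > 0$,

$$P\bigl(\beta_{n,\alpha_n}\delta(t_1) + F(t_1) - F_n(t_1) > \tilde\varepsilon\lambda\bigr) \to 0, \qquad n \to \infty.$$

Because $F_n(t_1) - F(t_1) = O_p(n^{-1/2})$ and $\lambda \sim n^{-\gamma}$ with $\gamma < 1/2$, this follows
from the finiteness of $\delta(t)$ and, because $\alpha_n$ vanishes slowly, from (17). This
completes the first part of the proof of Theorem 2.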
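The lower bound that follows (27) rests on one identity that is easy to miss. The short worked step below assumes, as the mixture form of $F_n$ suggests, that $F$ denotes the expectation of $F_n$, so that $F(t) = \lambda G(t) + (1 - \lambda)t$; this definition of $F$ is not restated in the proof and is taken here as an assumption.

```latex
\begin{align*}
F(t) - t &= \lambda G(t) + (1 - \lambda)\,t - t
          = \lambda\,\bigl(G(t) - t\bigr), \\
\frac{F(t) - t}{1 - t} &= \lambda\,\frac{G(t) - t}{1 - t}.
\end{align*}
```

Evaluated at $t_1$, the right-hand side exceeds $(1 - \varepsilon/2)\lambda$, so it only remains to show that the stochastic term and the penalty are of smaller order than $\lambda$.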

For the second part, it suffices to show that for $\nu \in (1/2, 1]$ and $\gamma \in (1 - \nu, 1/2)$,
and any $\varepsilon > 0$,

$$P(\hat\lambda > \varepsilon\lambda) \to 0, \qquad n \to \infty. \tag{30}$$

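In this regime, the penalty $\beta_{n,\alpha_n}\delta(t)$ is asymptotically larger than the signal
from false null hypotheses. Using the definition of $\hat\lambda$, the notation $n_0 =
(1 - \lambda)n$ and $n_1 = \lambda n$, and $F_n(t) = \lambda G_{n_1}(t) + (1 - \lambda) U_{n_0}(t)$, it follows that

\begin{align}
P(\hat\lambda > \varepsilon\lambda)
&= P\Bigl(\sup_{t \in (0,1)} \frac{F_n(t) - t - \beta_{n,\alpha_n}\delta(t)}{1 - t} > \varepsilon\lambda\Bigr) \notag \\
&= P\Bigl(\sup_{t \in (0,1)} \Bigl[\lambda\,\frac{G_{n_1}(t) - t}{1 - t} - \varepsilon\lambda
   + (1 - \lambda)\,\frac{U_{n_0}(t) - t}{1 - t} - \beta_{n,\alpha_n}\frac{\delta(t)}{1 - t}\Bigr] > 0\Bigr) \notag \\
&\le P\Bigl(\sup_{t \in (0,1)} \Bigl[\lambda\,\frac{G_{n_1}(t) - t}{1 - t} - \varepsilon\lambda
   - \frac{\beta_{n,\alpha_n}}{2}\,\frac{\delta(t)}{1 - t}\Bigr] > 0\Bigr) \tag{31} \\
&\quad + P\Bigl(\sup_{t \in (0,1)} \Bigl[(1 - \lambda)\,\frac{U_{n_0}(t) - t}{1 - t}
   - \frac{\beta_{n,\alpha_n}}{2}\,\frac{\delta(t)}{1 - t}\Bigr] > 0\Bigr). \tag{32}
\end{align}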

Observe in (32) that $(1 - \lambda)^{-1}\beta_{n,\alpha_n} = n\beta_{n,\alpha_n}/n_0 \ge \beta_{n_0,\alpha_n} \ge \beta_{n_0,\alpha_{n_0}}$. Thus
(32) can be bounded by $P(V_{n_0,\delta} > \beta_{n_0,\alpha_{n_0}}/2)$. By (16) and $n_0 \to \infty$ it follows
that (32) vanishes for $n \to \infty$. It remains to show that (31) vanishes as well.
Let $t_2 = \sup\{t \in (0, 1) : G(t) < \varepsilon/2\}$. Using Bonferroni's inequality, (31) is
bounded by

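\begin{align}
& P\Bigl(\sup_{t \in (0, t_2]} \lambda\,\frac{G_{n_1}(t) - t}{1 - t} - \varepsilon\lambda > 0\Bigr) \tag{33} \\
&\quad + P\Bigl(\sup_{t \in (t_2, 1)} \Bigl[\lambda\,\frac{G_{n_1}(t) - t}{1 - t}
   - \frac{\beta_{n,\alpha_n}}{2}\,\frac{\delta(t)}{1 - t}\Bigr] > 0\Bigr). \tag{34}
\end{align}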

Because $(G_{n_1}(t) - t)/(1 - t) \le G_{n_1}(t)$ for all $t \in [0, 1)$, the first term (33) is
bounded by $P(G_{n_1}(t_2) > \varepsilon)$, which vanishes for $n \to \infty$ because by definition
of $t_2$, $G(t_2) \le \varepsilon/2$ and $n_1 = \lambda n \to \infty$. Using $G_{n_1}(t) \le 1$, the second term (34)
equals zero if $\beta_{n,\alpha_n} \inf_{t \in (t_2,1)} \delta(t)/(1 - t) > 2\lambda$. By conditions (a) and (b)
in Definition 3, it holds that $\inf_{t \in (t_2,1)} \delta(t)/(1 - t) > 0$. By (14), it follows
furthermore that $\beta_{n,\alpha_n}/\lambda \to \infty$ for $n \to \infty$, which completes the proof. □

Proof of Theorem 3. First it is shown that for $r > \frac{1}{\nu}(\gamma - \frac{1}{2})$,

$$P(\hat\lambda < (1 - \varepsilon)\lambda) \to 0, \qquad n \to \infty. \tag{35}$$

Here the penalty is again asymptotically larger than the signal from false
null hypotheses for a fixed point $t \in (0, 1)$. However, because the signal from
false null hypotheses is increasing in strength for larger $n$, the evidence for
a certain amount of false null hypotheses can be found at decreasing values
of $t$. Using the definition of $\hat\lambda$, for any $t \in (0, 1)$, $\hat\lambda \ge F_n(t) - t - \beta_{n,\alpha_n}\delta(t)$
and, hence, for any $t \in (0, 1)$,

$$\hat\lambda/\lambda - 1 \ge -(1 - G_{n_1}(t)) - t - \frac{1}{\lambda}\,|t - U_{n_0}(t)| - \frac{1}{\lambda}\,\beta_{n,\alpha_n}\delta(t),$$

where again $n_1 = \lambda n$ and $n_0 = (1 - \lambda)n$. Choosing $t_{n,\tau} = n^{-r+\tau}$ for some
$0 < \tau < r - \frac{1}{\nu}(\gamma - \frac{1}{2})$, observe that by (20) it follows that $1 - G(n^{-r+\tau}) = o(1)$.
Hence

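\begin{align*}
\hat\lambda/\lambda - 1 &\ge -(1 - G(t_{n,\tau})) - |G(t_{n,\tau}) - G_{n_1}(t_{n,\tau})| \\
&\quad - t_{n,\tau} - \frac{1}{\lambda}\,|t_{n,\tau} - U_{n_0}(t_{n,\tau})| - \frac{1}{\lambda}\,\beta_{n,\alpha_n}\delta(t_{n,\tau}) \\
&= o(1) - o_p(1) - o(1) - O_p\bigl(n^{\gamma - (1/2 + (r - \tau)/2)}\bigr) \\
&\quad - O\bigl(n^{\gamma - (1/2 + (r - \tau)\nu)} \log n\bigr).
\end{align*}

The proof of (35) follows because $\gamma < \frac{1}{2} + \nu(r - \tau) \le \frac{1}{2} + (r - \tau)/2$.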

Second, it has to be shown that $P(\hat\lambda > \varepsilon\lambda) \to 0$ if $r < \frac{1}{\nu}(\gamma - \frac{1}{2})$. Again,
the evidence for a certain amount of false null hypotheses would have to be
found at decreasing values of $t$. However, the decrease has to be so fast in this
regime that the signal from false null hypotheses is not captured. Using again
the notation $n_1 = \lambda n$ and $n_0 = (1 - \lambda)n$, we find that $\hat\lambda = \sup_{t \in (0,1)} D_{n,\lambda}(t)$,
where

$$D_{n,\lambda}(t) := \frac{\lambda G_{n_1}(t) + (1 - \lambda) U_{n_0}(t) - t - \beta_{n,\alpha_n}\delta(t)}{1 - t}. \tag{36}$$

Choose a sequence $t_{n,\rho} = n^{-r-\rho}$ for some $0 < \rho < \frac{1}{\nu}(\gamma - \frac{1}{2}) - r$. The regions
$(0, t_{n,\rho}]$ and $(t_{n,\rho}, 1)$ are considered separately in the following. In particular,
it is shown that both $P(\sup_{t \in (0, t_{n,\rho}]} D_{n,\lambda}(t) > \varepsilon\lambda)$ and
$P(\sup_{t \in (t_{n,\rho}, 1)} D_{n,\lambda}(t) > \varepsilon\lambda)$ vanish for $n \to \infty$. For $t > t_{n,\rho}$, it holds that



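\begin{align}
P\Bigl(\sup_{t \in (t_{n,\rho}, 1)} D_{n,\lambda}(t) > \varepsilon\lambda\Bigr)
&\le P\Bigl(\sup_{t \in (t_{n,\rho}, 1)} \Bigl[\lambda + (1 - \lambda)\,\frac{U_{n_0}(t) - t}{1 - t}
   - \beta_{n,\alpha_n}\frac{\delta(t)}{1 - t}\Bigr] > 0\Bigr) \notag \\
&\le P\Bigl(\sup_{t \in (t_{n,\rho}, 1)} \Bigl[(1 - \lambda)\,\frac{U_{n_0}(t) - t}{1 - t}
   - \frac{\beta_{n,\alpha_n}}{2}\,\frac{\delta(t)}{1 - t}\Bigr] > 0\Bigr) \notag \\
&\quad + 1\Bigl\{\sup_{t \in (t_{n,\rho}, 1)} \Bigl[\lambda - \frac{\beta_{n,\alpha_n}}{2}\,\frac{\delta(t)}{1 - t}\Bigr] > 0\Bigr\} \notag \\
&= P\Bigl(\sup_{t \in (t_{n,\rho}, 1)} \Bigl[(U_{n_0}(t) - t) - \frac{\beta_{n,\alpha_n}}{2(1 - \lambda)}\,\delta(t)\Bigr] > 0\Bigr) \tag{37} \\
&\quad + 1\Bigl\{\inf_{t \in (t_{n,\rho}, 1)} \frac{\beta_{n,\alpha_n}}{2}\,\frac{\delta(t)}{1 - t} < \lambda\Bigr\}. \tag{38}
\end{align}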

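By (16) and because $n\beta_{n,\alpha_n}$ is monotonically increasing, (37) vanishes
for $n \to \infty$. For (38), because $\delta \in \mathcal{Q}_\nu$, there exists a constant $c$ so that
$\inf_{t \in (t_{n,\rho}, 1)} \delta(t)/(1 - t) \ge c\,n^{-\nu(r + \rho)}$. It follows by $r + \rho < \frac{1}{\nu}(\gamma - \frac{1}{2})$ that
$\inf_{t \in (t_{n,\rho}, 1)} \beta_{n,\alpha_n}\delta(t)/((1 - t)\lambda) \to \infty$ for $n \to \infty$, which completes the first part
of the proof.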

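It remains to show that $P(\sup_{t \in (0, t_{n,\rho}]} D_{n,\lambda}(t) > \varepsilon\lambda) \to 0$ for $n \to \infty$. It
holds that

\begin{align}
P\Bigl(\sup_{t \in (0, t_{n,\rho}]} D_{n,\lambda}(t) > \varepsilon\lambda\Bigr)
&\le P\Bigl(\sup_{t \in (0, t_{n,\rho}]} \Bigl[(1 - \lambda)\,\frac{U_{n_0}(t) - t}{1 - t}
   - \beta_{n,\alpha_n}\frac{\delta(t)}{1 - t}\Bigr] > \frac{\varepsilon}{3}\lambda\Bigr) \tag{39} \\
&\quad + P\Bigl(\sup_{t \in (0, t_{n,\rho}]} \lambda\,\frac{G_{n_1}(t) - G(t)}{1 - t} > \frac{\varepsilon}{3}\lambda\Bigr) \tag{40} \\
&\quad + 1\Bigl\{\sup_{t \in (0, t_{n,\rho}]} \lambda\,\frac{G(t)}{1 - t} > \frac{\varepsilon}{3}\lambda\Bigr\}. \tag{41}
\end{align}

As already argued above, the probability on the right-hand side of (39)
vanishes for $n \to \infty$. The probability (40) clearly likewise vanishes and it
remains to show that (41) vanishes as well for $n \to \infty$. Because $t_{n,\rho} \to 0$, it
holds that $(1 - t)^{-1} < 2$ for $t \in (0, t_{n,\rho}]$ and large enough values of $n$. The
term (41) vanishes hence if $G(t_{n,\rho}) < \varepsilon/6$. This is equivalent to
$\log G^{-1}(\varepsilon/6) > -(r + \rho) \log n$, and the claim follows from property (20). □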